The Impact of Outgroup Choice and Missing Data on Major Seed Plant Phylogenetics Using Genome-Wide EST Data

Authors

  • Jose Eduardo de la Torre-Bárcena
  • Sergios-Orestis Kolokotronis
  • Ernest K. Lee
  • Dennis Wm. Stevenson
  • Eric D. Brenner
  • Manpreet S. Katari
  • Gloria M. Coruzzi
  • Rob DeSalle

Abstract

BACKGROUND: Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole-genome DNA sequences for hundreds of organisms and of large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions: the hierarchical relationships of the major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which remain a work in progress despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group.

METHODOLOGY: We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations.

CONCLUSIONS: We evaluated character support and the relative contribution of numerous variables (e.g., gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) to tree topology, stability and support metrics. Our results indicate that while missing characters and the order in which genes are added to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold beyond which support metrics stabilize and the effect of adding conflicting characters is minimized.
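
As a rough illustration of what a supermatrix with missing data looks like, the sketch below concatenates per-gene alignments and pads any taxon absent from a partition with `?`. It is not the authors' pipeline: the function names `build_supermatrix` and `missing_fraction`, the `?` padding convention, and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical type for this sketch: taxon name -> aligned sequence for one gene.
Alignment = Dict[str, str]


def build_supermatrix(
    partitions: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into a single supermatrix.

    Taxa absent from a partition are padded with `missing`.
    Returns the matrix and (gene, start, end) ranges, 1-based and inclusive.
    """
    taxa = sorted({t for aln in partitions.values() for t in aln})
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    ranges: List[Tuple[str, int, int]] = []
    offset = 0
    for gene, aln in partitions.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"partition {gene!r} is empty or not aligned")
        width = lengths.pop()
        for t in taxa:
            # Pad with the missing-data symbol when the taxon lacks this gene.
            rows[t].append(aln.get(t, missing * width))
        ranges.append((gene, offset + 1, offset + width))
        offset += width
    matrix = {t: "".join(parts) for t, parts in rows.items()}
    return matrix, ranges


def missing_fraction(matrix: Dict[str, str], missing: str = "?") -> Dict[str, float]:
    """Fraction of missing characters per taxon in the supermatrix."""
    return {t: seq.count(missing) / len(seq) for t, seq in matrix.items()}


if __name__ == "__main__":
    # Toy data invented for illustration only.
    toy = {
        "geneA": {"Cycas": "ATGC", "Ginkgo": "ATGA", "Gnetum": "ATGT"},
        "geneB": {"Cycas": "GG", "Gnetum": "GC"},
    }
    matrix, ranges = build_supermatrix(toy)
    print(matrix)
    print(ranges)
    print(missing_fraction(matrix))
```

The per-gene ranges are kept because partitioned analyses need to know where each gene starts and ends in the concatenated matrix. The per-taxon missing fraction is one simple way to quantify how sparse an EST-derived matrix is.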

Similar resources

Random roots and lineage sorting.

Lineage sorting has been suggested as a major force in generating incongruent phylogenetic signal when multiple gene partitions are examined. The degree of lineage sorting can be estimated using the coalescent process, and simulation studies have also pointed to a major role for incomplete lineage sorting as a factor in phylogenetic inference. Some recent empirical studies point to an extreme ro...

Genome-Wide Association Study of Seedling Characteristics in Bread Wheat Cultivars Under Normal and Salt Stress Conditions

To identify loci controlling seedling morpho-physiological characteristics in 88 bread wheat cultivars, a greenhouse experiment based on a simple alpha lattice design was conducted under both normal and 120 mM (12 dS/m) salt stress conditions at the Faculty of Agriculture, Urmia University, in the 2020-2021 cropping season. Chlorophyll a, b and carotenoid content, proline, plant fresh and dry weight, p...

The Impact of Different Genetic Architectures on Accuracy of Genomic Selection Using Three Bayesian Methods

Genome-wide evaluation uses the associations of a large number of single nucleotide polymorphism (SNP) markers across the whole genome and then combines statistical methods with genomic data to predict genetic values. Genomic prediction relies on linkage disequilibrium (LD) between genetic markers and quantitative trait loci (QTL) in a population. Methods that use all markers simultaneo...

A Phylogenetic Study of Arecaceae Based on Seedling Morphological and Anatomical Data

A morphological and anatomical survey was carried out of seedlings of 62 taxa of palms representing all major groups. The data were analyzed using cladistic parsimony analysis. Seedling data were analyzed independently and combined with adult morphological data. Outgroup selection was made within the family using the calamoids and Nypa fruticans; outside the family, the monocot family Dasypogon...

Molecular phylogeny of three desert truffles from Iran based on ribosomal genome

The ITS region, including the 5.8S gene of rDNA, of three desert truffle species was amplified using the ITS4 and ITS1 primers. The ITS sequences were compared with related authentic sequences obtained from GenBank. Among 12 specimens studied, seven isolates corresponded to Terfezia claveryi as reported by other authors. Iranian T. claveryi specimens had an average similarity of 99.4% (ran...

Journal title:
  • PLoS ONE

Volume 4, Issue

Pages -

Publication date: 2009